A study was conducted to determine to what extent 
college teacher-made multiple-choice tests contain test-wiseness 
clues that can be used to identify correct answers. A sample of 43 
undergraduate teacher-made examinations was obtained from two 
colleges and three universities. The sample included midterm and 
final examinations and quizzes. The tests were written by 36 faculty 
members, including 9 assistant professors, 23 associate professors, 
and 4 full professors. A total of 1,220 multiple-choice questions 
were evaluated based on 10 test-wiseness criteria (Millman, et al., 
1965). It was found that 44 percent of the items contained a 
test-wiseness clue, and 70 percent of these items could be answered 
correctly by applying a clue. The clue discovered most often was 
"direct opposites" (i.e., writing an alternative directly opposite to 
the correct answer). The overall most successful clue was "key word 
association." It is suggested that college teachers need to consider 
how to avoid these clues when developing examinations. Implications 
for college personnel responsible for student academic improvement 
and faculty development are addressed. Explanations of the 10 
test-wiseness criteria are included, along with information on the 
data analysis procedures. (SW) 

Counseling services on campus responsible for academic skills 
training and development have long paid considerable attention to 
test anxiety. Its debilitating effects on test performance have 
been well documented (Sarason, 1972; Sarason, 1975; Wine, 1971). 
Related to the emotional problem of test anxiety is the measurement 
concern of test reliability. When tests are unreliable, 
teachers are given spurious estimates of their students' abilities. 
In the case of the sufferer of test anxiety, his/her true ability 
may be seriously underestimated due to an inordinate fear of tests. 
Student anxiety, however, is only one factor that can negatively 
affect test reliability. Another source of unreliability in testing, 
recognized by measurement experts, comes in the form of unintentional 
clues to correct answers inherent in the instrument itself. 

Since the introduction of the concept of "test-wiseness" and 
its effects on test reliability (Thorndike, 1951), efforts have 
been made by producers of commercially developed standardized tests 
and conscientious faculty to reduce clues to correct answers within 
their instruments. Test-wiseness has been defined as a student's 
capacity to utilize the characteristics and formats of the test 
and/or the test-taking situation to receive a high score (Millman, 
et al., 1965). Although test-wiseness clues have been a concern 
of many in higher education, test-wise students still tend to do 
better than their unsophisticated peers on both standardized and 
teacher-made examinations (Bajtelsmit, 1977; Brunner, 1976; Flynn 
& Anderson, 1973; Kirkland & Hollandsworth, 1979; Langer, Wark & 
Johnson, 1973). 

Millman, Bishop and Ebel (1965) developed a set of test-wiseness 
principles that have been used to help students increase 
their scores on multiple-choice examinations. These principles 
provide the test-taker with information on how to select the correct 
alternative excluding the use of content area knowledge. They have 
been promoted in various how-to-study books written for students 
(Brozo & Schmelzer, in press; Morgan & Deese, 1969; Pauk, 1974; 
Schmelzer, et al., 1980), as well as in materials specifically 
designed to teach students test-wiseness (Millman & Pauk, 1969; 
Sherman & Wildman, 1982; Wark, et al., 1972; Woodley, 1978). 

Based on the knowledge that test-wiseness does exist and that 
its application seems to have a significant influence on test scores, 
a study was conducted in an attempt to answer the following question: 
To what extent do current teacher-made multiple-choice tests at the 
college/university level contain test-wiseness clues which can be 
used to identify correct answers? 

The study was limited in scope and, therefore, the authors make 
no sweeping claims about the general ability of all faculty members 
to construct reliable tests. Nevertheless, it was felt that the 
research might serve as a model for other academic assistance 
personnel interested in discovering the extent to which test-wiseness 
clues are evident in faculty-constructed multiple-choice examinations 
on their campuses. What is more, if test-wiseness clues to correct 
answers are found in significant numbers, efforts to bring about 
more reliable measurements of classroom learning should go in two 
directions. First, programs should be made available for improving 
the faculty's ability to construct multiple-choice tests which are 
free of clues to correct answers; second, test-wiseness should be 
taught to unsophisticated test-takers and test-anxious students, which 
may help them achieve higher scores. 

METHOD 

Materials 

A sample of forty-three undergraduate teacher-made examinations 
was obtained from two colleges and three universities. Both colleges 
(one contributed eight tests, the other contributed seven tests) and 
two of the universities (one contributed ten tests, the other 
contributed six tests) were located in the Southeastern region of the 
U.S., with one major university (contributing 12 tests) from the West 
Coast. Included in the sample were midterm and final examinations 
and quizzes. The tests were from the following content areas: 
business (7); education (5); English (6); geology (3); history (6); 
health education (6); nursing (6); psychology (2). The tests 
represented the item writing efforts of 36 faculty members, including 
nine assistant professors, 23 associate professors and four full 
professors. 

Procedure 

A total of 1,220 multiple-choice questions were evaluated 
according to the following test-wiseness criteria (Millman, et al., 1965): 

1. The longest alternative is usually correct. Operationally, 
this was defined as an alternative having at least one full line of 
print more than the other alternatives. 

2. The alternative of middle value is usually correct. 
Operationally this meant that when given a list of alternatives which 
contained information that could be ordered from large to small, 
greater to lesser, the alternative between the extremes would most 
likely be correct. 

3. When an alternative contains a key word that is included 
in the stem, it is usually correct. Operationally this meant that 
a key word in the stem was readily associated with the same word or 
a synonym in the alternative. 

4. When an alternative contains a specific determiner that 
allows for exceptions, it is usually correct. Such determiners 
include the words: often, perhaps, seldom, generally, may, usually, 
sometimes. 

5. When two alternatives are directly opposite in meaning, one 
is usually correct. To illustrate this criterion, notice the 
difference between these two alternatives: "decrease as pupils advance 
through school," "increase as pupils advance through school." 

6. If the other alternatives are specific in nature and one is 
general in nature, then the most general alternative is usually correct. 
In the following example, alternative "B" is more general than the 
others. Shakespeare's tragedies: (A) deal exclusively with the English 
monarchs who ruled between 600-1200 (B) deal with the English kings 
who reigned before the 17th century (C) deal with monarchs of the 
Tudor line only (D) were written within the first year of his 
playwriting career. 

7. If only one alternative is grammatically consistent with the 
stem, it is correct. For instance, a stem that reads "Important for 
blood regulation are" requires the correct alternative to be plural 
in nature. If all other alternatives are singular in nature, then the 
plural alternative must be correct because it agrees grammatically 
with the stem. 

A search was made for three other clues within the 43 tests. The 
following clues, though not discussed by Millman, were included 
because of their widespread acceptance by students and teachers of 
test-wiseness. 

8. When in doubt, choose the alternative "all of the above." 

9. When most of the test is comprised of four alternative items 
with some items containing five alternatives, select the fifth 
alternative. 

10. When in doubt, select alternative "C." 

Only these clues were considered in this study. Others were 
excluded because of the difficulty in developing a workable 
operational definition that could be utilized by the researchers. 

Tests were divided among the three researchers and each test item 
was evaluated according to the test-wiseness criteria above. When 
there was doubt whether a test item fit the criteria for a particular 
test-wiseness clue, it was presented to the other researchers and was 
discarded if one or more of the researchers disagreed. 
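
Some of the criteria above are mechanical enough that an item could be
pre-screened automatically before a rater examines it. The Python sketch
below is an illustration only, not the procedure used in this study, where
items were rated by hand. The `Item` structure and the `find_clues` function
are hypothetical names, and simple word matching stands in for rater
judgment. It covers clues 4, 8, and 9.

```python
import re
from dataclasses import dataclass

# Determiners listed under criterion 4.
DETERMINERS = {"often", "perhaps", "seldom", "generally", "may",
               "usually", "sometimes"}


@dataclass
class Item:
    stem: str
    alternatives: list  # alternative texts in order A, B, C, ...


def find_clues(item):
    """Return (clue name, index of the alternative the clue points to)."""
    clues = []
    for i, text in enumerate(item.alternatives):
        words = set(re.findall(r"[a-z]+", text.lower()))
        if words & DETERMINERS:  # criterion 4: determiner allowing exceptions
            clues.append(("specific determiner", i))
        if text.strip().lower().rstrip(".") == "all of the above":  # criterion 8
            clues.append(("all of the above", i))
    # Criterion 9 applies only when most items on the test have four
    # alternatives; this sketch assumes that is the case.
    if len(item.alternatives) == 5:
        clues.append(("fifth alternative", 4))
    return clues


item = Item(
    stem="Test anxiety",
    alternatives=["always lowers scores", "never lowers scores",
                  "may lower scores", "is unrelated to scores"],
)
print(find_clues(item))
```

A flagged item would still need a human decision on whether the clue really
fits, as in the disagreement rule described above.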

Data Analysis 

Data were analyzed in the following manner: 

1. Items that met the criteria for a test-wiseness clue were 
categorized by the clue and totaled (Clue Related Items). No item 
was found to possess more than one clue. 

2. Within each category, the number of Clue Related Items which 
could be answered correctly by using the clue was totaled (Observed 
Correct). 

3. The total of Clue Related Items within each category was 
multiplied by 25 percent, based on a four alternative item, to 
determine how many items could be answered correctly by chance. For 
those items associated with the clue "select the fifth alternative," 
the total was multiplied by 20 percent (Expected Correct). 

4. The difference between the Observed Correct and Expected 
Correct answers within each category was tested using the normal curve 
approximation binomial test. 

5. For the clue "when in doubt select alternative C," Expected 
Correct was calculated from the total number of items analyzed in the 
study (1,220): .25 x 1,169 (four alternative items) and .20 x 51 
(five alternative items). 

6. The binomial test was computed to determine the difference 
between the total number of Observed Correct and Expected Correct 
answers for all clues. The clue "when in doubt select alternative 
C" was excluded because this clue overlaps all other clues. 
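
The steps above can be sketched in a few lines of Python. This is a minimal
illustration rather than the authors' actual computation: the function name
`binomial_z` is hypothetical, and the continuity correction of 0.5 is an
assumption, since the text says only that a normal curve approximation to the
binomial test was used. The example counts are those reported for the
"direct opposites" clue (151 clue-related items, 115 answerable by the clue).

```python
import math


def binomial_z(n_items, observed_correct, p_chance=0.25, correction=0.5):
    """Normal-approximation z for observed vs. chance-level correct answers.

    n_items: number of Clue Related Items in a category
    observed_correct: items answerable by applying the clue
    p_chance: 0.25 for four-alternative items, 0.20 for five
    correction: continuity correction (an assumption); subtracting it is
        appropriate when observed_correct exceeds the expected count
    """
    expected = n_items * p_chance                        # Expected Correct
    sd = math.sqrt(n_items * p_chance * (1 - p_chance))  # binomial std. dev.
    z = (observed_correct - expected - correction) / sd
    return expected, z


# "Direct opposites": 151 clue-related items, 115 answerable by the clue.
expected, z = binomial_z(151, 115)
print(f"Expected Correct = {expected:.2f}, z = {z:.2f}")

# Step 5: Expected Correct for alternative "C" over all 1,220 items.
expected_c = 0.25 * 1169 + 0.20 * 51
print(f"Expected Correct for alternative C = {expected_c:.2f}")
```

Under these assumptions the sketch gives an Expected Correct of 37.75 and a
z of about 14.42 for direct opposites, and an Expected Correct of 302.45 for
alternative "C".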

RESULTS 

The research question was concerned with how often test-wiseness 
clues appear in college teacher-made multiple-choice tests and to 
what extent these clues are associated with correct answers. Table 1 
shows that out of a total of 1,220 multiple-choice items analyzed, 
539 (44%) contained a test-wiseness clue. Three hundred and 
seventy-five (70%) of these items could be answered correctly by 
applying a clue. Compared with the chance of choosing the correct 
answer by guessing (132.2 items, or .25 x 488 and .20 x 51), 375 
represented a highly significant number of items (z = 24.10, p < .001). 

TABLE 1 

Test-Wiseness Clues: Their Frequency and Probability 
in a Sample of College Teacher-Made Tests (N = 43) 

                              Clue Related  Observed      Expected
Clue                          Items         Correct (%)   Correct (%)         z-Score  p Value

 1. Longest                     54           41 (.76)      13.5  (.25)         8.49    p < .001
 2. Middle Value                79           65 (.82)      19.75 (.25)        11.62    p < .001
 3. Key Word Association        38           38 (1.00)      9.5  (.25)        10.49    p < .001
 4. Specific Determiner (True)  52           18 (.35)      13.0  (.25)         1.44    p < .075
 5. Direct Opposites           151          115 (.76)      37.75 (.25)        14.42    p < .001
 6. Most General              [values illegible]
 7. Grammatical Agreement        0           --            --                 --       --
 8. All of the Above            69           37 (.54)      17.75 (.25)         5.34    p < .001
 9. Fifth Alternative           51           25 (.54)      10.2  (.20)         5.00    p < .001

    Summary of All Clues       539          375 (.70)     132.2  (.25)(.20)*  24.10    p < .001

10. Alternative "C"           1220          318 (.26)     302.45 (.25)(.20)*   1.02    p < .154

*For five alternative items (N = 51) 
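
The summary figures reported above can be re-derived from the counts. The
short Python sketch below does so; the variable names are illustrative, and
the count of four-alternative clue-related items is an assumption obtained by
subtracting the 51 five-alternative items from the 539 clue-related items.

```python
# Counts reported in the Results section.
total_items = 1220
clue_items = 539
observed_correct = 375
five_alt_items = 51

# Assumption: every clue-related item that is not a five-alternative
# item has four alternatives.
four_alt_items = clue_items - five_alt_items

share_with_clue = clue_items / total_items
share_answerable = observed_correct / clue_items
expected_correct = 0.25 * four_alt_items + 0.20 * five_alt_items

print(f"{share_with_clue:.0%} of items contained a clue")
print(f"{share_answerable:.0%} of those were answerable by the clue")
print(f"{expected_correct:.1f} correct answers expected by chance")
```

With these inputs the chance expectation comes to 132.2 items, against 375
items actually answerable by applying a clue.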

Table 1 also reveals that all but two of the clues were 
significantly associated with the correct answer (p < .001). 

The clue discovered most often was "Direct Opposites." Found 
in 151 items, it pointed to the correct answer (one of the two 
opposite alternatives) in 115 cases. It may be that the ease of writing 
an alternative directly opposite to the correct answer leads college 
teachers to use this strategy so often to generate response options 
for multiple-choice items. 

The overall most successful clue was "Key Word Association." 
Although this clue did not appear often (3% of all 1,220 items), 
there was a one-to-one correspondence between the clue and the correct 
answer. 

The clue related to grammatical agreement between the stem and 
one of the alternatives was not found at all. Professors are apparently 
well aware of this rather obvious clue. 

The clue "Alternative C" was of no more help in choosing the 
correct answer than chance guessing. This suggests that teachers 
are either making an effort to distribute the correct answer equally 
among the alternatives, or they are mindful of at least another possible 
test-wiseness strategy their students might employ. 

There was no significant difference between Observed Correct 
and Expected Correct for the clue "Specific Determiner for True." 
On a practical level, however, a student could have obtained five 
more correct answers by applying the clue instead of guessing. 

CONCLUSIONS AND RECOMMENDATIONS 

It was observed that most of the test-wiseness clues analyzed 
in this study were highly significant predictors of the correct 
response for a given test item. This fact points up the need for 
college teachers to give more consideration to avoiding these clues 
when developing their examinations. This is important if teachers 
desire to avoid penalizing the unsophisticated test-taker as well as 
increase the chance that the test is a reliable measurement of 
students' classroom learning. 

Due to the limited sample of university and college teacher-made 
tests in this study, generalization of these findings to all 
faculty cannot be made. Nonetheless, this research provides some 
important direction for college personnel responsible for student 
academic improvement and faculty development. 

First, based on the design of this study, a university-specific 
evaluation of faculty-constructed multiple-choice tests should be 
made. If test-wiseness clues to correct answers appear in significant 
numbers, then item writing training workshops should be made available 
to faculty for improving the reliability of their examinations. 

Mehrens and Lehmann (1973) characterize a "good" test writer as 
having a thorough understanding of subject matter and the pupils 
being tested as well as the ability to use creativity in writing 
objective questions while avoiding clues to the correct answers. 
Other educational measurement specialists (Anderson, 1977; Bloom, et 
al., 1971; Gronlund, 197[?]) have emphasized that writing adequate 
multiple-choice items requires time, practice, and a checklist of 
common errors that should be avoided. A checklist should include 
general questions about multiple-choice test item writing (Anderson, 
1977): 

Does the stem introduce what is expected of the student? 

Is the stem free from irrelevant material? 

Are all of the alternatives plausible and homogeneous? 

Have you avoided repeating words or phrases in the alternatives? 

Have you avoided overlapping alternatives? 

Are all alternatives approximately the same length? 

If you use the incomplete statement format, does the alternative 
come at the end of the statement? 

The findings of this study suggest that a checklist should also 
include detailed questions about test-wiseness clues. The following 
are some examples of questions based on clues reported here. 

Is the longest alternative the correct answer? 

Is the alternative of middle value the correct answer? 

If there are direct opposites among the alternatives, is one the 
correct answer? 

Are there any key words in the stem readily associated with the 
same word or a synonym in the alternatives? 

Clues to the correct answers in multiple-choice tests may never 
be totally eliminated. Nevertheless, the checklist above represents 
a start in the development of a more exhaustive checklist that can be 
used in item writing workshops to improve a college teacher's ability 
to write multiple-choice items. Additions to the list will come from 
the teacher's own experiences with multiple-choice tests as well as 
from test-writing experts. 
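
Two of the checklist questions above can be approximated in code, which may
help when reviewing a large bank of items. The Python sketch below is a rough
illustration under stated assumptions: the function names are hypothetical,
alternative length is compared by character count with an arbitrary margin
(the study instead required at least one full printed line of difference),
and key word association is reduced to exact shared words, so synonyms are
missed.

```python
import re

# Common words ignored when looking for stem/alternative overlap
# (an illustrative list, not taken from the study).
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "are", "to", "and", "or",
              "for", "with", "that", "which", "by", "on"}


def content_words(text):
    """Lower-cased words of a text, minus common function words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS


def longest_is_correct(alternatives, correct_index, margin=20):
    """True if the keyed answer is at least `margin` characters longer
    than every other alternative."""
    others = [len(a) for i, a in enumerate(alternatives) if i != correct_index]
    return len(alternatives[correct_index]) >= max(others) + margin


def key_word_points_to_answer(stem, alternatives, correct_index):
    """True if the keyed answer is the only alternative sharing a
    content word with the stem."""
    stem_words = content_words(stem)
    sharing = [i for i, a in enumerate(alternatives)
               if content_words(a) & stem_words]
    return sharing == [correct_index]


stem = "The reliability of a test refers to"
alternatives = ["its length",
                "the consistency of test scores across occasions",
                "its cost",
                "its format"]
print(longest_is_correct(alternatives, 1))
print(key_word_points_to_answer(stem, alternatives, 1))
```

An item for which either function returns True would be a candidate for
rewriting before the test is administered.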

The study also points out the continued value of teaching 
test-wiseness to students who are unsophisticated in its use. This may 
be particularly true for students not familiar with multiple-choice 
tests (foreign students, special admission students) in order to 
avoid the unfair advantage other more sophisticated students may have 
over them in test taking skills. In addition, teaching test-wiseness 
clues to test-anxious students may help lower their anxiety. Support 
for this idea can be found from a number of researchers who brought 
about a decrease in test anxiety and an improvement in other personality 
dimensions through academic skills training (Garrison, 1971; Johnson, 
1975; Long, 1972; Roth, 1969; Sayles, 1965). Often a student's fear 
of tests is rooted in a lack of confidence about his/her ability to 
take tests. This interaction between personality and skills should 
be a primary concern of college learning center personnel. 

According to Anderson, the ideal classroom measurement situation 
occurs when all test-takers are equally sophisticated and the test 
writer has designed an instrument as error free as possible. In 
this case, the chance for reliable measurement is greatly enhanced. 
College academic assistance personnel are in the unique position 
of being able to provide programming for faculty and students alike 
which encourages reliability in classroom testing. 